The importance of the ability to predict trends in social media has been growing rapidly in the past 
few years with the growing dominance of social media in our everyday life. Whereas many works 
focus on the detection of anomalies in networks, there exists little theoretical work on predicting 
the likelihood that anomalous network patterns will spread globally and become "trends". In this 
work we present an analytic model of the social diffusion dynamics of spreading network patterns. 
Our proposed method is based on information diffusion models, and is capable of predicting future 
trends based on the analysis of past social interactions between the community's members. We 
present an analytic lower bound for the probability that emerging trends will successfully spread 
through the network. We demonstrate our model using two comprehensive social datasets: the 
Friends and Family experiment that was held at MIT for over a year, where the complete activity of 
130 users was analyzed, and a financial dataset containing the complete activities of over 1.5 million 
members of the eToro social trading community. 

I. INTRODUCTION 

We live in the age of social computing. Social networks are everywhere, exponentially increasing in volume, and 
changing everything about our lives, the way we do business, and how we understand ourselves and the world around 
us. The challenges and opportunities residing in the social oriented ecosystem have overtaken the scientific, financial, 
and popular discourse. 

In this paper we study the evolution of trend spreading dynamics in social networks. Whereas there have been 
numerous works studying the topic of anomaly detection in networks (social and others), the literature still lacks a 
theoretical model capable of predicting how network anomalies evolve. When do they spread and develop into global 
trends, and when are they merely statistical phenomena, local fads that get quickly forgotten? We give an analytically 
proven lower bound for the spreading probability, capable of detecting "future trends": spreading behavior patterns 
that are likely to become prominent trends in the social network. 

We demonstrate our model using social networks from two different domains. The first is the Friends and Family 
experiment [1], held at MIT for over a year, where the complete activity of 130 users was analyzed, including data 
concerning their calls, SMS, MMS, GPS location, accelerometer, web activity, social media activities, and more. The 
second dataset contains the complete financial transactions of the members of the eToro community, the world's largest 
"social trading" platform, which allows users to trade in currency, commodities and indices by selectively copying the 
trading activities of prominent traders. 

The rest of the paper is organized as follows: Section II discusses related work. The information diffusion 
model is presented in Section III, its applicability is demonstrated in Section IV, and concluding remarks 
are given in Section V. 

II. RELATED WORK 



Diffusion Optimization. Analyzing the spreading of information has been a central focus in the study of 
social networks for the last decade [2]. Researchers have explored both offline network structure, by asking 
and incentivizing users to forward real mail and e-mail [3], and online networks, by collecting and analyzing data 
from various sources such as Twitter feeds [?]. 

The dramatic effect of the network topology on the dynamics of information diffusion in communities was 
demonstrated in works such as [?]. One of the main challenges associated with modeling behavioral dynamics in 
social communities stems from the fact that it often involves stochastic generative processes. While simulations on 
realizations from these models can help us explore the properties of networks [8], a theoretical analysis is much more 
appealing and robust. The results we present in this work are based on a purely theoretical analysis. 

The identity and composition of the initial "seed group" in trend analysis has also been the topic of much research. 
Kempe et al. applied theoretical analysis to the seed selection problem [9], based on two simple adoption models: 
the Linear Threshold Model and the Independent Cascade Model. Recently, Zaman et al. developed a method that traces 
rumors back along the topological spreading path to identify sources in a social network [10], and suggested that such 
a method can be used to locate influencers in a network. Some scholars express doubts and concerns about the 
influencer-driven viral marketing approach, suggesting that "everyone is an influencer" [11], and that companies 
"should not rely on it" [12]. They argue that the content of the message is also important in determining its spread, 
and that the adoption models in common use are likely not a good representation of reality. 

Our work, on the other hand, focuses on predicting emerging trends given a current snapshot of the network and 
adoption status, rather than finding the most influential nodes. We provide a lower bound for the probability that 
an emerging trend would spread throughout the network, based on the analysis of the outreach of the diffusion process, 
which is largely missing in the current literature. 

Adoption Model. A fundamental building block in trend prediction that is not yet entirely clear to scholars is 
the adoption model, which models individuals' behavior based on the social signals they are exposed to. Centola has 
shown both theoretically and empirically that a complex contagion model is indeed more precise for diffusion [13, 14]. 
Different adoption models can dramatically alter the model outcome [15]. In fact, a recent work studying the diffusion 
of mobile applications using mobile phones demonstrated that in the real world the diffusion process is a far more 
complicated phenomenon, and a more realistic model was proposed in [16]. Our results also incorporate this realistic 
diffusion model. 

Trends Prediction and Our Proposed Model. In this work we study the following question: given a snapshot 
of a social network with some behavior occurrences (i.e., an emerging trend), what is the probability that these 
occurrences (seeds) will result in a viral diffusion and a widespread trend (or alternatively, dissolve into oblivion)? 
Though this is similar to the initial seed selection problem [9], we believe that the key factor for success in 
optimizing a viral marketing campaign is a better analytical model of the diffusion process itself. 

The main innovation of our model is the fact that it is based on a fully analytical framework with a scale-free network 
model. Therefore, we manage to overcome the dependence on simulations of diffusion processes that characterizes 
most of the works in this field [?, 17]. We are able to do so by decomposing the diffusion process into the transitive 
random walk of "exposure agents" and the local adoption model based on [16]. While there are some works that 
analyze scale-free networks [18], most of them fall short of providing accurate results, due to the fact that they 
calculate the expected values of the global behavior dynamics. Due to a strong "network effect", however, many 
real world networks display much less coherent patterns, involving local fluctuations and high variance in observed 
parameters, rendering such methods highly inaccurate and sometimes impractical. Our analysis, on the other hand, 
tackles this problem by modeling the diffusion process on scale-free networks in a way that takes such 
interferences into account and can bound their overall effect on the network. 



III. TREND PREDICTION IN SOCIAL NETWORKS 



One of the main difficulties of trend prediction stems from the fact that the first spreading phase of "soon to be 
global trends" demonstrates significant similarity to other types of anomalous network patterns. In other words, given 
several observed anomalies in a social network, it is very hard to predict which of them will result in a widespread 
trend and which will quickly dissolve into oblivion. 

We model the community, or social network, as a graph G that is comprised of V (the community's members) 
and E (social links among them). We use n to denote the size of the network, namely |V|. In this network, we 
are interested in predicting the future behavior of some observed anomalous pattern a. Notice that a can refer to a 
growing use of some new web service such as Groupon, or alternatively to a behavior such as associating oneself with 
the "99% movement". 

Notice that "exposures" to trends are transitive. Namely, an "exposing" user generates "exposure agents" which 
can be transmitted on the network's social links to "exposed users", which can in turn transmit them onwards to 
their friends, and so on. We therefore model a trend's exposure interactions as movements of random walking agents 
in a network. Every user that was exposed to a trend a generates $\beta$ such agents, on average. 

We assume that our network is (or can be approximated by) a scale-free network $G(n, c, \gamma)$, namely, a network of 
$n$ users where the probability that user $v$ has $d$ neighbors follows a power law: 

$$P(d) \sim c \cdot d^{-\gamma}$$
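
As a minimal illustration of the degree distribution above, the following Python sketch tabulates $P(d) = c \cdot d^{-\gamma}$ over a finite degree range. The function name `power_law_pmf`, the cutoff `d_max`, and the choice of $c$ as a normalizing constant are assumptions made for the example; the model itself leaves $c$ as a free parameter.

```python
def power_law_pmf(gamma, d_max):
    """Return [P(1), ..., P(d_max)] with P(d) = c * d**(-gamma).

    Assumption (not from the paper): degrees are integers in 1..d_max and
    c is chosen so that the probabilities sum to 1.
    """
    if gamma <= 0 or d_max < 1:
        raise ValueError("need gamma > 0 and d_max >= 1")
    weights = [d ** (-gamma) for d in range(1, d_max + 1)]
    c = 1.0 / sum(weights)
    return [c * w for w in weights]


# Hypothetical parameters, chosen only for illustration.
pmf = power_law_pmf(gamma=2.5, d_max=1000)
mean_degree = sum(d * p for d, p in enumerate(pmf, start=1))
```

With a heavy tail of this kind, most users have very few neighbors while a small number of hubs have many, which is why degree ratios between neighboring users matter in the analysis that follows.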

We also define the following properties of the network : 

Definition 1. Let $V_a(t)$ denote the group of network members that at time $t$ advocate the behavior associated with 
the potential trend $a$. 

Definition 2. Let us denote by $\beta > 0$ the average "diffusion factor" of a trend $a$, namely, the average number of 
friends with whom a user who has been exposed to the trend will talk about the trend (or to whom the user will expose 
the trend in other ways). 
Definition 3. Let $P_\Delta$ be defined as the probability that two arbitrary vertices of the network have a degree 
ratio of $\Delta$ or higher: 

$$P_\Delta = \mathrm{Prob}\left[\deg(u) > \Delta \cdot \deg(v)\right]$$

Definition 4. We denote by $\sigma_-$ the "low temporal resistance" of the network: 



Vt, A t , tr_ = max 1 1 < A 



l-e-A-^™. (1 _p A ) 

Definition 5. Let $P_{\text{Local-Adopt}}(a, v, t, \Delta_t)$ denote the probability that at time $t + \Delta_t$ the user $v$ has adopted trend $a$ 
(for some values of $t$ and $\Delta_t$). This probability may be different for each user, and may depend on properties such as 
the network's topology, past interactions between members, etc. 

Definition 6. Let $P_{\text{Local}}$ denote the expected value of the local adoption probability throughout the network: 

$$P_{\text{Local}} = E_u\left[P_{\text{Local-Adopt}}(a, u, t, \Delta_t)\right]$$

Definition 7. Let us denote by $P_{\text{Trend}}\left(\Delta_t, \frac{|V_a(t)|}{n}, \varepsilon\right)$ the probability that at time $t + \Delta_t$ the group of network members 
that advocate the trend $a$ has at least $\varepsilon \cdot n$ members (namely, that $|V_a(t + \Delta_t)| \geq \varepsilon \cdot n$). 

We assume that the seed group of members that advocate a trend at time $t$ is randomly placed in the network. 
Under this assumption we can now present the main result of this work: the lower bound over the prevalence of an 
emerging trend. Note that we use $P_{\text{Local-Adopt}}$ as a modular function in order to allow future validation in other 
environments. The explicit result is given in Theorem 2. 

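Theorem 1. For any value of $\Delta_t$, $|V_a(t)|$, $n$, $\varepsilon$, the probability that at time $t + \Delta_t$ at least an $\varepsilon$ portion of the network's 
users would advocate trend $a$ is lower bounded as follows: 

$$P_{\text{Trend}}\left(\Delta_t, \frac{|V_a(t)|}{n}, \varepsilon\right) \geq P_{\text{Local}}^{\varepsilon \cdot n} \cdot \left(1 - \Phi\left(\sqrt{n} \cdot \frac{\varepsilon - P_-}{\sqrt{P_-(1 - P_-)}}\right)\right)$$

where: 

$$P_- = e^{-\left(\frac{\Delta_t \cdot \sigma_-}{2} - \rho_{opt_-} + \frac{\rho_{opt_-}^2}{2 \Delta_t \cdot \sigma_-}\right)}$$

and provided that: 

$$\rho_{opt_-} < \Delta_t \cdot \sigma_-$$

and, as $\sigma_-$ depends on $P_\Delta$, using the following bound: 

$$\forall v, u \in V, \quad P_\Delta \leq \frac{c^2 \cdot \Delta^{1-\gamma}}{2\gamma^2 - 3\gamma + 1}$$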

Proof. See Appendix for the complete proof of the Theorem 



□ 
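
To make the shape of this bound concrete, the sketch below evaluates the right-hand side of Theorem 1 in Python, treating $\sigma_-$, $P_{\text{Local}}$ and $\rho$ as given inputs rather than deriving them from a network. It assumes the bound has the form $P_{\text{Local}}^{\varepsilon n} \cdot (1 - \Phi(\cdot))$ with $P_- = \exp\left(-\left(\Delta_t\sigma_-/2 - \rho + \rho^2/(2\Delta_t\sigma_-)\right)\right)$, as stated in the Appendix. All function names and numeric values are hypothetical.

```python
import math


def normal_cdf(x):
    """Standard normal cumulative distribution function Phi(x)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def p_minus(rho, delta_t, sigma_minus):
    """Chernoff-type bound P_-; requires 0 <= rho < delta_t * sigma_minus."""
    m = delta_t * sigma_minus
    if not 0 <= rho < m:
        raise ValueError("the bound requires 0 <= rho < delta_t * sigma_minus")
    return math.exp(-(m / 2.0 - rho + rho ** 2 / (2.0 * m)))


def trend_lower_bound(n, eps, delta_t, sigma_minus, rho, p_local):
    """Evaluate P_local**(eps*n) * (1 - Phi(sqrt(n)*(eps - P_-)/sqrt(P_-(1-P_-))))."""
    if not 0 < eps <= 1:
        raise ValueError("eps must lie in (0, 1]")
    p = p_minus(rho, delta_t, sigma_minus)
    if p <= 0.0:
        # exp() underflowed to zero: the normal term vanishes for eps > 0.
        return 0.0
    z = math.sqrt(n) * (eps - p) / math.sqrt(p * (1.0 - p))
    return p_local ** (eps * n) * (1.0 - normal_cdf(z))


# Hypothetical inputs: 130 users, 14 time steps, sigma_- = 0.2, rho = 1.
small_target = trend_lower_bound(130, 0.1, 14, 0.2, 1, 0.99)
large_target = trend_lower_bound(130, 0.9, 14, 0.2, 1, 0.99)
```

In this toy setting the bound for a 10% penetration target is far larger than the bound for a 90% target, which matches the qualitative behavior one would expect from the theorem.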



Recent studies examined the way influence is conveyed through social links. In [16] the probability that network 
users install applications, after being exposed to the applications installed by their friends, was tested. This behavior 
was shown to be best modeled as follows, for some user $v$: 

$$P_{\text{Local-Adopt}}(a, v, t, \Delta_t) = 1 - e^{-(s_v + p_a(v))} \qquad (1)$$

Exact definitions and methods of obtaining the values of $s_v$ and $w_{v,u}$ can be found in [16]. The intuition behind these 
network properties is as follows: 






For every member $v \in V$, $s_v \geq 0$ captures the individual susceptibility of this member, regardless of the specific 
behavior (or trend) in question. $p_a(v)$ denotes the network potential for the user $v$ with respect to the trend $a$, and is 
defined as the sum of network agnostic "social weights" of the user $v$ with the friends exposing him to the trend $a$. 

Notice also that both properties are trend-agnostic. However, while $s_v$ is evaluated once for each user and is 
network agnostic, $p_a(v)$ contributes network specific information and can also be used by us to decide the identity of 
the network's members that we would target in our initial campaign. 
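
The local adoption model of Equation (1) can be sketched in a few lines of Python. The example assumes that the susceptibility $s_v$ and the social weights of the exposing friends are already known (their estimation is described in [16] and is not reproduced here); the function name and the sample values are hypothetical.

```python
import math


def local_adoption_probability(s_v, exposing_weights):
    """Return 1 - exp(-(s_v + p_a(v))).

    p_a(v) is taken to be the sum of the social weights w_{v,u} over the
    friends u currently exposing v to the trend (an input assumed known).
    """
    if s_v < 0 or any(w < 0 for w in exposing_weights):
        raise ValueError("susceptibility and weights are assumed non-negative")
    network_potential = sum(exposing_weights)
    return 1.0 - math.exp(-(s_v + network_potential))


# Hypothetical user: susceptibility 0.1, exposed by two friends.
p_two_friends = local_adoption_probability(0.1, [0.2, 0.3])
p_no_friends = local_adoption_probability(0.1, [])
```

The probability grows with each additional exposing friend and saturates toward 1, so exposure by many weak ties can matter as much as exposure by one strong tie.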

Using Theorem 1 we can now construct a lower bound for the success probability of a campaign, regardless of the 
specific value of $\rho$: 

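Theorem 2. For every $\Delta_t$, $|V_a(t)|$, $n$, $\varepsilon$, the probability that at time $t + \Delta_t$ at least an $\varepsilon$ portion of the network's users 
advocate the trend $a$ is: 

$$P_{\text{Trend}}\left(\Delta_t, \frac{|V_a(t)|}{n}, \varepsilon\right) \geq e^{\ldots} \cdot \left(1 - \Phi\left(\sqrt{n} \cdot \frac{\varepsilon - P_-}{\sqrt{P_-(1 - P_-)}}\right)\right)$$

where: 

$$P_- = e^{-\left(\frac{\Delta_t \cdot \sigma_-}{2} - \rho_{opt_-} + \frac{\rho_{opt_-}^2}{2 \Delta_t \cdot \sigma_-}\right)}$$

and where 

$$\rho_{opt_-} = \operatorname{argmin}_{\rho} \left( \ldots \right)$$

and provided that: 

$$\rho_{opt_-} < \Delta_t \cdot \sigma_-$$

and where $\xi_G$ denotes the network's adoption factor and $\xi_N$ denotes the network's influence factor: 

$$\xi_N = e^{\ldots}$$

Proof. See complete proof in the Appendix. 

□ 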



IV. EXPERIMENTAL RESULTS 



We have validated our model using two comprehensive datasets: the Friends and Family dataset, which studied the 
casual and social aspects of a small community of students and their friends in Cambridge, and the eToro dataset, 
comprising the entire financial transactions of over 1.5M users of a "social trading" community. 

The datasets were analyzed using the model given in [16], based on which we have experimentally calculated the 
values of $\beta$, $\xi_G$, $\xi_N$ and $\sigma_-$. 

Figures 1 and 2 demonstrate the probabilistic lower bound for trend emergence, as a function of the overall 
penetration of the trend at the end of the time period, under the assumption that the emerging trend was observed in 5% of 
the population. In other words, for any given "magnitude" of trends, what is the probability that network phenomena 
that are being advocated by 5% of the network would spread to this magnitude? Notice that although a longer 
spreading time slightly improves the penetration probability, the "maximal outreach" of the trend (the maximal rate 
of global adoption, with sufficient probability) is dominated by the topology of the network and the local adoption 
features. 



V. CONCLUSIONS AND FUTURE WORK 



In this work we have discussed the problem of trend prediction, that is, observing anomalous network patterns 
and predicting which of them will become a prominent trend, spreading successfully throughout the network. 
We have analyzed this problem using information diffusion techniques, and have presented a lower bound for the 
probability of a pattern to become a global trend in the network, for any desired level of spreading. In order to model 
the local interaction between members, we have used the results from [16] that studied the local social influence 
dynamics between members of social networks. 

FIG. 1: Trends spreading potential in the eToro network, for various penetration rates. Initial seed group is defined as 5% of 
the population. Each curve represents a different time period, from 2 weeks to 6 weeks. 

FIG. 2: Trends spreading potential in the Friends and Family network, for various penetration rates. Initial seed group is 
defined as 5% of the population. Each curve represents a different time period, from 2 weeks to 5 weeks. 



Though our work provides a comprehensive theoretical framework for understanding trend diffusion in social networks, 
there are still a few challenges ahead. For example, we wish to extend our model to other network models such as 
Erdos-Renyi random networks, as well as Small World networks. This is essential as growing evidence suggests 
that some communities involve complex structures that cannot be easily approximated using a simplistic scale-free 
model [?]. 

In addition, our results can be used in order to provide answers to other questions, such as what is the optimal 
group of members that should be used as a "seed group" in order to maximize the effects of marketing campaigns. 
Another example might be finding changes in the topology of the social network that would influence the information 
diffusion process in a desired way (either to encourage or to suppress certain emerging trends). 

In order to achieve these goals we are planning a large-scale field test with a leading online social platform, which 
would allow us to collect more supporting empirical evidence, as well as to conduct an active experiment in 
which we would try to predict trends in real time. 

Finally, we are interested in comparing the predictions obtained from our model with the actual semantics of the 
trends, to better understand the connection between the semantics of trends and the diffusion process they undergo. 

Appendix 



Appendix A: Proof of Theorem 1 



In order to prove Theorem 1 we shall require the following definitions: 

Definition 8. Let $N_{v,a}(t)$ denote the number of friends of user $v$ that at time $t$ are exposing $v$ to the trend $a$ (namely, 
the number of friends of $v$ that at time $t$ have been exposed to the trend $a$ and are conveying this information to $v$). 

Note that "exposing" a neighbor to a trend does not necessarily mean advocating the same trend. 

Definition 9. Let us denote by $P_{\rho\text{-Trend}}\left(\Delta_t, \rho, \frac{|V_a(t)|}{n}, \varepsilon\right)$ the probability that at time $t + \Delta_t$ at least $\varepsilon \cdot n$ members of 
the network have been exposed to the trend $a$ by at least $\rho$ of their friends. 

In addition, we define $\rho_{opt_-}$, which is used in the Theorem: 

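Definition 10. 

$$\rho_{opt_-} = \operatorname{argmin}_{\rho} \left( P_{\text{Local}}^{\varepsilon \cdot n} \cdot P_{\rho\text{-Trend}}\left(\Delta_t, \rho, \frac{|V_a(t)|}{n}, \varepsilon\right) \right)$$

We later see that the expression for $P_{\text{Trend}}$ refers to $\rho$. Using $\rho_{opt_-}$ we will later be able to omit this 
dependence. 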

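Theorem 1. 

For any value of $\Delta_t$, $|V_a(t)|$, $n$, $\varepsilon$, the probability that at time $t + \Delta_t$ at least an $\varepsilon$ portion of the network's users would 
advocate trend $a$ is lower bounded as follows: 

$$P_{\text{Trend}}\left(\Delta_t, \frac{|V_a(t)|}{n}, \varepsilon\right) \geq P_{\text{Local}}^{\varepsilon \cdot n} \cdot \left(1 - \Phi\left(\sqrt{n} \cdot \frac{\varepsilon - P_-}{\sqrt{P_-(1 - P_-)}}\right)\right)$$

where: 

$$P_- = e^{-\left(\frac{\Delta_t \cdot \sigma_-}{2} - \rho_{opt_-} + \frac{\rho_{opt_-}^2}{2 \Delta_t \cdot \sigma_-}\right)}$$

and provided that: 

$$\rho_{opt_-} < \Delta_t \cdot \sigma_-$$
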

Proof. We first assess the number of "agents" residing in vertices adjacent to some vertex $v$ at any given time: 

Lemma 1. Let $v \in V$ be an arbitrary user of the network $G$. Then: 

\Va(t)\- 



E v [N v , a {t + A t )-N v , a (t)]=vP At ■ 



C • V 7 



n 

Proof. We assume that the movement of the agents in the network is random [?]. Hence: 

V?i € V(G) , E[number of agents residing on u] = — ^ ® 

At time $t$ there are $|V_a(t)|$ members of the network that advocate trend $a$. Those members generate on average $\beta$ 
"agents" that are sent along the social links to their friends, creating chains of length $\Delta_t$, and a total of $\beta^{\Delta_t} \cdot |V_a(t)|$ 
active agents. Incorporating this with the distribution of the degrees, the rest is implied. □ 

The following Lemma bounds the probability that two arbitrarily selected vertices have degrees that differ 
by more than a certain ratio: 

Lemma 2. 

$$\forall v, u \in V, \quad P_\Delta \leq \frac{c^2 \cdot \Delta^{1-\gamma}}{2\gamma^2 - 3\gamma + 1}$$
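
Proof. By definition: 

$$P_\Delta = \mathrm{Prob}\left[\deg(u) > \Delta \cdot \deg(v)\right] \leq \int_1^\infty \int_\Delta^\infty \left( \mathrm{Prob}[\deg(v) = j] \cdot \mathrm{Prob}[\deg(u) = m \cdot j] \right) dm \, dj$$

As the network is scale free, we can write (the integrals converge for $\gamma > 1$): 

$$P_\Delta \leq \int_1^\infty \int_\Delta^\infty \left( c \cdot j^{-\gamma} \cdot c \cdot (m \cdot j)^{-\gamma} \right) dm \, dj = \int_1^\infty \left[ \frac{c^2 \cdot j^{-2\gamma} \cdot m^{-\gamma+1}}{1-\gamma} \right]_{m=\Delta}^{\infty} dj$$

$$= \left[ \frac{-c^2 \cdot \Delta^{1-\gamma} \cdot j^{1-2\gamma}}{(1-2\gamma)(1-\gamma)} \right]_{j=1}^{\infty} = \frac{c^2 \cdot \Delta^{1-\gamma}}{(2\gamma-1)(\gamma-1)} = \frac{c^2 \cdot \Delta^{1-\gamma}}{2\gamma^2 - 3\gamma + 1}$$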



□ 
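
As a quick numerical companion to Lemma 2, the following Python sketch evaluates the bound $c^2 \Delta^{1-\gamma} / (2\gamma^2 - 3\gamma + 1)$. The function name and the sample parameters are hypothetical, and the requirement $\gamma > 1$ is an assumption added so that the integrals in the proof converge.

```python
def degree_ratio_bound(c, gamma, delta):
    """Upper bound on P_Delta = Prob[deg(u) > delta * deg(v)] from Lemma 2.

    Assumes gamma > 1 and delta > 0. The value may exceed 1 for small
    delta or large c, in which case the bound is trivially true.
    """
    if gamma <= 1:
        raise ValueError("the bound assumes gamma > 1")
    if delta <= 0:
        raise ValueError("delta must be positive")
    return c ** 2 * delta ** (1.0 - gamma) / (2.0 * gamma ** 2 - 3.0 * gamma + 1.0)


# Hypothetical parameters: c = 1, gamma = 3.
bound_ratio_2 = degree_ratio_bound(1.0, 3.0, 2.0)
bound_ratio_10 = degree_ratio_bound(1.0, 3.0, 10.0)
```

The bound decays as $\Delta^{1-\gamma}$, so large degree ratios between two arbitrary users become rapidly less likely as the power-law exponent grows.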



Lemma 3. For any member $v \in V$ at time $t + \Delta_t$, the probability that $v$ will be exposed at the next time-step to the 
trend $a$ is at least $\sigma_-$. 

Proof. The probability that an agent located on a vertex $u$ such that $(u, v) \in E$ will move to $v$ at the next time-step is 
$\frac{1}{\deg(u)}$. Therefore, remembering that $v$ has $N_{v,a}(t)$ agents that reside in neighboring vertices at time $t$, the probability 
that $v$ will not be exposed to the trend at the next time-step is: 

$$\left(1 - \frac{1}{\deg(u)}\right)^{N_{v,a}(t)} \qquad (A1)$$



Using the well known inequality $(1 - x) < e^{-x}$ for $x < 1$, and taking into account Lemma 1, we can bound Equation 
(A1) from above by: 

$$\left(1 - \frac{1}{\deg(u)}\right)^{N_{v,a}(t)} \leq e^{-\frac{\deg(v)}{\deg(u)} \cdot \frac{\beta^{\Delta_t} \cdot |V_a(t)|}{n}}$$

Using Lemma 2 we can further simplify Equation (A2) as follows: 

At -|Vq(t)l 



Prob 



deg(v) g"t-|V a (t)| 
g deg(u) n 



< e" 



> (1 - ^a) 



(A2) 



(A3) 



Therefore, the probability that a user will be exposed to the trend in the next time step is at least 



1 - e 



a -Pa) 



which equals $\sigma_-$. 



□ 



We can now proceed to the calculation of $P_{\rho\text{-Trend}}$. 

Lemma 4. The probability that at time $t + \Delta_t$ at least $\varepsilon \cdot n$ members of the network have been exposed to a trend $a$ 
by at least $\rho$ of their friends is lower bounded as follows: 

$$P_{\rho\text{-Trend}}\left(\Delta_t, \rho, \frac{|V_a(t)|}{n}, \varepsilon\right) \geq 1 - \Phi\left(\sqrt{n} \cdot \frac{\varepsilon - P_-}{\sqrt{P_-(1 - P_-)}}\right)$$

where: 

$$P_- = e^{-\left(\frac{\Delta_t \cdot \sigma_-}{2} - \rho + \frac{\rho^2}{2 \Delta_t \cdot \sigma_-}\right)}$$

and where $\Phi(x)$ is the cumulative normal distribution function, defined as: 

$$\Phi(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-\frac{1}{2} t^2} dt$$

and also provided that: 

$$\rho < \Delta_t \cdot \sigma_-$$

Proof. Using Lemma 3 we have a lower bound for the probability that a user $v$ will be exposed to some trend $a$ by 
an agent originated by one of the users that advocate the trend $a$ at time $t$. This is in fact a Bernoulli trial 
with success probability of $\sigma_-$. 

Denoting by $X_v(t)$ the number of times user $v$ is exposed to the trend $a$ after $t$ steps, we shall now use the 
negative variance Chernoff bound: 

$$P\left[X_v(t) < (1 - \delta) \cdot t \cdot \sigma_-\right] < e^{-\frac{\delta^2 \cdot t \cdot \sigma_-}{2}}$$

Selecting $\delta = 1 - \frac{\rho}{\Delta_t \cdot \sigma_-}$, and considering the entire lifespan of the trend (namely, $t = \Delta_t$), we obtain the probability 
that a single (specific) member will be exposed to the trend $a$ at least $\rho$ times. For this, we shall first define: 



p A p , A \Va(t)\ _u 
P- = Pp-Trend(AT,p 7 ,71 ) 



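which by definition implies: 

$$P_- = P\left[X_v(\Delta_t) < \rho\right] < e^{-\left(1 - \frac{\rho}{\Delta_t \cdot \sigma_-}\right)^2 \cdot \frac{\Delta_t \cdot \sigma_-}{2}} \leq e^{-\left(\frac{\Delta_t \cdot \sigma_-}{2} - \rho + \frac{\rho^2}{2 \Delta_t \cdot \sigma_-}\right)}$$

As the Chernoff bound requires that $\delta > 0$, we should make sure that: 

$$\rho < \Delta_t \cdot \sigma_-$$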
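
The Chernoff step can be checked numerically. The Python sketch below computes the exponent $\delta^2 \Delta_t \sigma_- / 2$ for $\delta = 1 - \rho / (\Delta_t \sigma_-)$ and compares the resulting bound with the exact lower tail of a Binomial($\Delta_t$, $\sigma_-$) variable. Modeling $X_v(\Delta_t)$ as exactly binomial, the function names, and the parameter values are assumptions made for this illustration.

```python
import math


def chernoff_exponent(rho, delta_t, sigma_minus):
    """Return delta**2 * delta_t * sigma_minus / 2 with delta = 1 - rho/(delta_t*sigma_minus)."""
    m = delta_t * sigma_minus
    delta = 1.0 - rho / m
    if not 0 < delta <= 1:
        raise ValueError("the bound requires 0 <= rho < delta_t * sigma_minus")
    return delta ** 2 * m / 2.0


def binomial_lower_tail(k, trials, p):
    """Exact P[X < k] for X ~ Binomial(trials, p)."""
    return sum(
        math.comb(trials, i) * p ** i * (1.0 - p) ** (trials - i)
        for i in range(k)
    )


# Hypothetical values: 14 time steps, sigma_- = 0.2, rho = 1 exposure.
exponent = chernoff_exponent(1, 14, 0.2)
bound = math.exp(-exponent)
exact = binomial_lower_tail(1, 14, 0.2)
```

Expanding the square shows that the exponent equals $\Delta_t\sigma_-/2 - \rho + \rho^2/(2\Delta_t\sigma_-)$, which is the form used in the statement of Lemma 4. In this example the exact tail lies well below the bound, illustrating that the bound is conservative.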

As we want to bound the probability that at least $\varepsilon \cdot n$ of the network members are exposed to the trend at least 
$\rho$ times, we shall use the above as the success probability of a second Bernoulli trial. As $n$ is assumed to be large, 
the number of exposed members can be approximated using the Normal distribution: 

$$P_{\rho\text{-Trend}}\left(\Delta_t, \rho, \frac{|V_a(t)|}{n}, \varepsilon\right) \geq 1 - \Phi\left(\frac{\varepsilon \cdot n - n \cdot P_-}{\sqrt{n \cdot P_-(1 - P_-)}}\right)$$

and the rest is implied. □ 

Whereas Lemma 4 provides an estimation concerning the global outreach of trends in terms of exposure, it does not 
take into account the probability that users that are exposed to the trend by $\rho$ of their friends will actually adopt it. 
In order to do this, we need to incorporate $P_{\text{Local-Adopt}}$ into Lemma 4, producing a combined bound for the global 
adoption of the trend. 

Proposition 1. For any $\Delta_t$, $|V_a(t)|$, $n$, $\varepsilon$, the probability that at time $t + \Delta_t$ at least an $\varepsilon$ portion of the network's users 
advocate the legitimate trend $a$ is: 

$$P_{\text{Trend},\rho}\left(\Delta_t, \rho, \frac{|V_a(t)|}{n}, \varepsilon\right) = P_{\text{Local}}^{\varepsilon \cdot n} \cdot P_{\rho\text{-Trend}}\left(\Delta_t, \rho, \frac{|V_a(t)|}{n}, \varepsilon\right)$$

Notice that $\rho$ appears in the expression of Proposition 1 for mathematical reasons, and has no actual meaning. We 
omit the dependency of the expression on $\rho$ by finding the optimal value of $\rho$ for every set of values of $\varepsilon$, $|V_a(t)|$ and 
$\Delta_t$, by assigning: 

$$P_{\text{Trend}}\left(\Delta_t, \frac{|V_a(t)|}{n}, \varepsilon\right) = P_{\text{Trend},\rho}\left(\Delta_t, \rho_{opt_-}, \frac{|V_a(t)|}{n}, \varepsilon\right)$$

And the rest is implied. □ 



Appendix B: Proof of Theorem 2 

Let us recall once again the local influence model that was shown in [16] to best approximate behavior diffusion 
in real world social networks: 

$$P_{\text{Local-Adopt}}(a, v, t, \Delta_t) = 1 - e^{-(s_v + p_a(v))} \qquad (B1)$$

We recall that $s_v \geq 0$ captures the individual susceptibility of this member, and that $p_a(v)$ denotes the network 
potential for the user $v$ with respect to the trend $a$, and is defined as the sum of network agnostic "social weights" of 
the user $v$ with the friends exposing him to the trend $a$: 

$$p_a(v) = \sum_{u \in N_{v,a}} w_{v,u}$$

(where $N_{v,a}$ is the overall group of users exposing $v$ to the trend $a$). 

Using Theorem 1 we can now construct a lower bound for the success probability of a campaign, regardless of the 
specific value of $\rho$: 

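Theorem 2. For every $\Delta_t$, $|V_a(t)|$, $n$, $\varepsilon$, the probability that at time $t + \Delta_t$ at least an $\varepsilon$ portion of the network's users 
advocate the legitimate trend $a$ is: 

$$P_{\text{Trend}}\left(\Delta_t, \frac{|V_a(t)|}{n}, \varepsilon\right) \geq e^{\ldots} \cdot \left(1 - \Phi\left(\sqrt{n} \cdot \frac{\varepsilon - P_-}{\sqrt{P_-(1 - P_-)}}\right)\right)$$

where: 

$$P_- = e^{-\left(\frac{\Delta_t \cdot \sigma_-}{2} - \rho_{opt_-} + \frac{\rho_{opt_-}^2}{2 \Delta_t \cdot \sigma_-}\right)}$$

and where: 

$$\rho_{opt_-} \triangleq \operatorname{argmin}_{\rho} \left( e^{\ldots} \cdot P_{\rho\text{-Trend}}\left(\Delta_t, \rho, \frac{|V_a(t)|}{n}, \varepsilon\right) \right)$$

and provided that: 

$$\rho_{opt_-} < \Delta_t \cdot \sigma_-$$

and where $\xi_G$ denotes the network's adoption factor and $\xi_N$ denotes the network's influence factor: 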

Proof. From Equation (B1) we have: 

$$P_{\text{Local-Adopt}}(a, v, t, \Delta_t) = 1 - e^{-(s_v + p_a(v))}$$

The expected value of the local adoption probability is therefore: 

$$E_{u \in V}\left[P_{\text{Local-Adopt}}(u)\right] = \frac{1}{n} \sum_{v \in V} \left(1 - e^{-\left(s_v + \sum_{u \in N_{v,a}} w_{v,u}\right)}\right)$$

(where $N_{v,a}$ is the group of user $v$'s friends). 

Using the inequality $(1 - x) < e^{-x}$ for $x < 1$, we see that 



£>ro 



pen 
Local — Adopt 
Using the fact that an arithmetic mean is never smaller than a geometric mean, Equation (B2) can be written as 
follows: 

^-A« tofrt <e- e ' E -^« ^ < (B3) 
Integrating Equation (B3) with Proposition 1 and Lemma 4, the rest is implied. □ 



